Data visualization




Brian S. Evans, Ph.D.
Migratory Bird Center
Smithsonian Conservation Biology Institute



Today’s goals


Setup


Get the data:

library(tidyverse)

gitUrl <-
  'https://raw.githubusercontent.com/bsevansunc/'

courseData <-
  'smsc_data_science/master/data/'

birdMeasures <- 
  read_csv(
    paste0(
      gitUrl,
      courseData,
      'birdMeasures.csv'))

Initiating a plot


ggplot(birdMeasures)

Aesthetics


Aesthetics describe mapping the value of some variable to an observable feature.

ggplot(
  birdMeasures, 
  aes(x = spp))

Geometries


A geometry provides a visible representation of observations. Geometries are added to a plot using functions of the form geom_[geometry]. Frequently used geometries include:

  • geom_bar: Bars for bar plots
  • geom_histogram: Histogram plot for observing distributions
  • geom_density: Density plot for observing distributions
  • geom_point: Point plot for observing raw data
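
Of the geometries listed above, geom_point is the only one that needs both an x and a y aesthetic. The following is a minimal sketch, assuming a small made-up data frame (toyBirds and its wing and mass values are hypothetical, not the course data):

```r
library(ggplot2)

# Hypothetical measurements (not the course data): wing length and
# mass for six made-up birds.
toyBirds <-
  data.frame(
    wing = c(60, 62, 65, 61, 64, 66),
    mass = c(9.8, 10.4, 11.1, 10.0, 10.9, 11.5))

# geom_point draws one point per row, positioned by x and y:
toyPoints <-
  ggplot(
    toyBirds,
    aes(x = wing, y = mass)) +
  geom_point()

toyPoints
```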

Geometries


ggplot(
  birdMeasures, 
  aes(x = spp)) +
  geom_bar()

Geometries


Piping helps!

birdMeasures %>%
  ggplot(aes(x = spp)) +
  geom_bar()

Geometries


Piping helps!

birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar()

Exercise One:


The function geom_density can be used to display the density distribution of a vector. Using the aesthetic x = mass, display the distribution of Black-capped and Carolina chickadee mass measurements:

# Subset birdMeasures to BCCH and CACH and plot density:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_density()

Geometries: Adding arguments


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram()

Geometries: Adding arguments


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(binwidth = 1)

Geometries: Adding arguments


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(bins = 20)

Geometries: Adding arguments


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(fill = 'gray')

Geometries: Adding arguments


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(
    fill = 'gray', 
    color = 'black')

Geometries: Adding arguments


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(fill = 'gray', 
           color = 'black',
           size = 0.7)

Exercise Two:


Modify your density plot from Exercise One:

  • Use the fill argument to fill your density shape with the color “gray”:
  • The argument alpha can be applied to a geometry to adjust its transparency. Adjust the density shape to alpha = 0.7

Exercise Two:


# Subset birdMeasures to BCCH and CACH and plot density:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_density(
    fill = 'gray',
    alpha = 0.7)

Geometries: Adding aesthetics


Aesthetics describe mapping the value of some variable to an observable feature.

birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(aes(fill = region))
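
The difference between setting a constant (fill = 'gray') and mapping a variable (aes(fill = region)) can be sketched side by side. This example assumes a small made-up data frame; toyBirds and its region values are hypothetical, not the course data:

```r
library(ggplot2)

# Hypothetical data (not the course data):
toyBirds <-
  data.frame(
    spp = c('BCCH', 'BCCH', 'CACH', 'CACH', 'CACH'),
    region = c('north', 'south', 'north', 'south', 'south'))

# Constant: fill is set outside of aes(), so every bar is gray.
constantFill <-
  ggplot(toyBirds, aes(x = spp)) +
  geom_bar(fill = 'gray')

# Mapped: fill is set inside of aes(), so the fill color depends on
# the value of region and a legend is added.
mappedFill <-
  ggplot(toyBirds, aes(x = spp)) +
  geom_bar(aes(fill = region))
```

Printing constantFill and mappedFill in turn shows the same bar heights, with only the fill differing.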

Geometries: Adding aesthetics


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex),
    bins = 20)

Geometries: Adding aesthetics


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex),
    bins = 20,
    color = 'black')

Exercise Three:


Modify your density plot from Exercise Two. Use the fill aesthetic within the function geom_density to assign a different fill color to females and males.

Exercise Three:


# Subset birdMeasures to BCCH and CACH and plot density:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_density(
    aes(fill = sex),
    alpha = 0.7)

Facets


Faceting splits a plot into multiple panels, one for each value of some variable.

Facets


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex), 
    bins = 20,
    color = 'black') +
  facet_wrap(~spp)

Facets


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex), 
    bins = 20,
    color = 'black') +
  facet_wrap(~spp, nrow = 2)

Exercise Four:


Modify your density plot from Exercise Three. Use the facet_wrap function with the argument nrow = 2 to generate separate plots of Black-capped and Carolina chickadees.

Exercise Four:


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_density(
    aes(fill = sex),
    alpha = 0.7) +
  facet_wrap(~spp, nrow = 2)

Labels


Labels describe the plot and axis titles.

Labels


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(
    aes(fill = region),
    color = 'black',
    size = .7) +
  labs(
    title = 'Birds banded and recaptured 2000-2017',
    x = 'Species',
    y = 'Count')

Labels


Piping can be used …

birdMeasures %>%
  filter(spp != 'NOCA') %>%
  mutate(
    spp = factor(
      spp,
      labels = c(
        'American robin',
        'Black-capped chickadee',
        'Carolina chickadee',
        'Gray catbird'
      )
    )) %>%
  ggplot(aes(x = spp)) +
  geom_bar(
    aes(fill = region),
    color = 'black',
    size = .7) +
  labs(
    title = 'Birds banded and recaptured 2000-2017',
    x = 'Species',
    y = 'Count')

Labels


It’s a good time to assign names!

birdCaptures_basicPlot <- 
  birdMeasures %>%
  filter(spp != 'NOCA') %>%
  mutate(spp = factor(
    spp,
    labels = c(
      'American robin',
      'Black-capped chickadee',
      'Carolina chickadee',
      'Gray catbird'
    )
  )) %>%
  ggplot(aes(x = spp)) +
  geom_bar(
    aes(fill = region),
    color = 'black',
    size = .7) +
  labs(
    title = 'Birds banded and recaptured 2000-2017',
    x = 'Species',
    y = 'Count')

Exercise Five:


Modify the density plot you created in Exercise Four:

  • Using the piping method, change the names “BCCH” and “CACH” to “Black-capped” and “Carolina”
  • Add the title “Mass of Carolina and Black-capped chickadees” and capitalize the x and y axis titles
  • Assign the name “massDensity” to the plot

Exercise Five:


# Labels for massDensity:

massDensity <- 
  birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  mutate(spp = factor(
    spp,
    labels = c(
      'Black-capped',
      'Carolina'
    )
  )) %>%
  ggplot(aes(x = mass)) +
  geom_density(
    aes(fill = sex),
    alpha = 0.7) +
  facet_wrap(~spp, nrow = 2) +
  labs(
    title = "Mass of Carolina and Black-capped chickadees",
    x = 'Mass', 
    y = 'Density')

massDensity

Scaling axes


Changing the scale of an axis changes the range of numbers and the names and locations of tick marks.
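
The three pieces named above (range, tick locations, and tick names) correspond to the limits, breaks, and labels arguments of scale_y_continuous. A minimal sketch, assuming a made-up data frame (toyCounts and the label text are hypothetical):

```r
library(ggplot2)

# Hypothetical counts (not the course data):
toyCounts <-
  data.frame(
    spp = c('BCCH', 'CACH'),
    n = c(4, 9))

toyScaled <-
  ggplot(toyCounts, aes(x = spp, y = n)) +
  geom_col() +
  scale_y_continuous(
    limits = c(0, 10),                    # range of the axis
    breaks = c(0, 5, 10),                 # locations of tick marks
    labels = c('None', 'Some', 'Many'))   # names of tick marks

toyScaled
```

The labels vector must be the same length as the breaks vector, because each label names one tick mark.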

Scaling axes


birdCaptures_basicPlot +
  scale_y_continuous(expand = c(0,0))

Scaling axes


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  group_by(spp) %>%
  summarize(n = n())
## # A tibble: 4 x 2
##   spp       n
##   <chr> <int>
## 1 AMRO    671
## 2 BCCH    508
## 3 CACH    797
## 4 GRCA   1395

Scaling axes


birdCaptures_basicPlot +
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, 1500))

Scaling axes


birdCaptures_basicPlot +
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, 1500),
    breaks = seq(0, 1500, by = 250))

Exercise Six:


Plot massDensity. Use the expand, limits, and breaks arguments of the function scale_y_continuous to scale the y-axis such that the scale ranges from 0 to 0.8 and breaks occur at intervals of 0.1.

Exercise Six:


massDensity +
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, 0.8),
    breaks = seq(0, 0.8, by = 0.1))

Colors


The default colors of ggplot are pretty ugly. Luckily, you can modify them in countless ways!

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex), 
    bins = 20,
    color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = c('blue', 'red'))

Colors


Color-picker apps can be a great way to find colors that you like on the internet.
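
Color pickers usually report a color as a hex code. As a rough sketch of how to read one, base R can convert between a hex code and its red, green, and blue values (the variable name hexColor is just for illustration):

```r
# A hex code stores red, green, and blue intensities (00 to FF) as
# three pairs of characters following the '#':
hexColor <- '#9EB8C5'

# col2rgb converts a color to red, green, and blue values (0 to 255):
col2rgb(hexColor)

# rgb goes the other direction:
rgb(158, 184, 197, maxColorValue = 255)
```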

Colors


Using Team Zissou’s hat and shirt color:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex), 
    bins = 20,
    color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = c('#9EB8C5', '#F32017'))

Colors


You can hunt around to find colors that you like and then save your palette for use later:

zPalette <- c('#9EB8C5', '#F32017')

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex), 
    bins = 20,
    color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = zPalette)

Exercise Seven:


Modify the density plot you created in Exercise Six. Use scale_fill_manual to set custom fill colors.

Exercise Seven:


# Colors for massDensity:

massDensity +
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, 0.8),
    breaks = seq(0, 0.8, by = 0.1)) +
  scale_fill_manual(values = c('#9EB8C5', '#F32017'))

Legends


Legends can be modified in a number of ways. One method to do so is to modify the data frame coming into the plotting functions:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  mutate(
    sex = factor(
      sex,
      labels = c('Female','Male'))) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex), 
    bins = 20,
    color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = c('#9EB8C5', '#F32017'))
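
The relabeling in the mutate step above depends on how factor pairs labels with levels. A minimal sketch, assuming the sex codes are 'F' and 'M' (the codes in the course data may differ, and sexCodes is a hypothetical vector):

```r
# Hypothetical sex codes (the codes in the course data may differ):
sexCodes <- c('M', 'F', 'F', 'M')

# By default, the levels are the sorted unique values ('F', 'M') and
# labels are matched to levels by position:
factor(sexCodes, labels = c('Female', 'Male'))

# Supplying levels makes the pairing explicit:
sexLabeled <-
  factor(
    sexCodes,
    levels = c('F', 'M'),
    labels = c('Female', 'Male'))

sexLabeled
```

Because labels are matched by position, writing the labels in the wrong order would silently swap the categories, so it is worth checking the result against the original codes.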

Legends


We can also use the scale_fill_manual function from above to modify the legend by specifying the name and labels arguments:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex), 
    bins = 20,
    color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(
    values = c('#9EB8C5', '#F32017'), 
    name = 'Sex', 
    labels = c('Female', 'Male'))

Exercise Eight:


Modify the density plot you created in Exercise Seven. Use scale_fill_manual to set the legend title and labels.

Exercise Eight:


# Legend title and labels for massDensity:
massDensity +
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, 0.8),
    breaks = seq(0, 0.8, by = 0.1)) +
  scale_fill_manual(
    values = zPalette,
    name = 'Sex', 
    labels = c('Female', 'Male')) 

Themes


A theme describes many of the visual elements of a plot.

Themes are controlled by elements, including:

  • element_blank: A blank element
  • element_rect: A rectangle element
  • element_text: A text element
  • element_line: A line element
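
Each of the four element functions listed above is passed to an argument of theme. A minimal sketch using one of each, assuming a made-up data frame (toyBirds is hypothetical, and the particular theme arguments are chosen only for illustration):

```r
library(ggplot2)

# Hypothetical data (not the course data):
toyBirds <-
  data.frame(spp = c('BCCH', 'CACH', 'CACH'))

toyThemed <-
  ggplot(toyBirds, aes(x = spp)) +
  geom_bar() +
  theme(
    panel.grid.minor = element_blank(),                # draw nothing
    panel.background = element_rect(fill = 'white'),   # a rectangle
    axis.title = element_text(size = 14),              # text
    axis.line = element_line(color = 'black'))         # a line

toyThemed
```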

Themes


Before exploring themes, let’s take a moment to assign names to the current versions of our plots:

histogram2Theme <- 
  birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  mutate(spp = factor(
    spp,
    labels = c(
      'Black-capped',
      'Carolina'
    )
  )) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(
    aes(fill = sex), 
    bins = 20,
    color = 'black') +
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, 150),
    breaks = seq(0, 150, by = 25)) +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(
    values = c('#9EB8C5', '#F32017'), 
    name = 'Sex', 
    labels = c('Female', 'Male'))

Themes


Before exploring themes, let’s take a moment to assign names to the current versions of our plots:

density2Theme <- massDensity +
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, 0.8),
    breaks = seq(0, 0.8, by = 0.1)) +
  scale_fill_manual(
    values = zPalette,
    name = 'Sex', 
    labels = c('Female', 'Male')) 

Themes


Remove gray panel background using element_rect:

histogram2Theme +
  labs(
    title = 'Mass of Carolina and Black-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white')
  )

Themes


Change panel lines using element_line:

histogram2Theme +
  labs(
    title = 'Mass of Carolina and Black-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2)
  )

Themes


Modify the strip background using element_rect:

histogram2Theme +
  labs(
    title = 'Mass of Carolina and Black-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    strip.background = element_rect(fill = 'white')
  )

Themes


Modify the axis lines using element_line:

histogram2Theme +
  labs(
    title = 'Mass of Carolina and Black-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white')
  )

Themes


Remove the legend title using element_blank:

histogram2Theme +
  labs(
    title = 'Mass of Carolina and Black-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank()
  )

Themes


Change the size of tick mark text using axis.text and element_text:

histogram2Theme +
  labs(
    title = 'Mass of Carolina and Black-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12)
  )

Themes


To make the axis titles bigger, we use axis.title and element_text:

histogram2Theme +
  labs(
    title = 'Mass of Carolina and Black-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 18)
  )

Themes


To make the facet labels bigger, we use strip.text and element_text:

histogram2Theme +
  labs(
    title = 'Mass of Carolina and Black-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 18)
  )

Themes


Make the plot title larger using plot.title and element_text:

histogram2Theme +
  labs(
    title = 'Mass of Carolina and Black-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 18),
    plot.title = element_text(size = 22)
  )

Themes


Add a margin between the plot and title (see ?margin):

histogram2Theme +
  labs(
    title = 'Mass of Carolina and\nBlack-capped chickadees',
    x = 'Mass',
    y = 'Count',
    fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 18),
    plot.title = element_text(size = 22, margin = margin(b = 40))
  )

Exercise Nine:


Make your density plot as pretty as possible using themes!